Comparative Study of Kernel Based Classification and Feature Selection Methods With Gene Expression Data

Author

  • Mingyue Tan
Abstract

Gene expression profiles obtained by high-throughput techniques such as microarrays provide a snapshot of the expression values of up to tens of thousands of genes in a particular tissue sample. Analyzing such gene expression data can be cumbersome because the sample size is small, the dimensionality is high, and the data are occasionally noisy. Kernel methods such as Support Vector Machines (SVMs) [5, 45] have been extensively applied in gene expression analysis, particularly to the problems of gene classification and selection. In general, kernel methods outperform other approaches because they handle high dimensionality easily. In this thesis, we perform a comparative study of various state-of-the-art kernel-based classification and feature selection methods on gene expression data. Our aim is to collect the results in one place so that readers can easily compare the methods' similarities and differences, both theoretically and empirically. In the literature, a feature selection method is evaluated by the classification accuracy obtained with the features it selects. This criterion measures how well the data can be classified after irrelevant features have been eliminated. We propose a second criterion, called stability, to evaluate feature selection methods in addition to classification accuracy. The feature set selected by a stable feature selection algorithm should not change significantly when small changes are made to the training data. In this thesis, we use both evaluation criteria to compare feature selection methods. It has been shown that cross validation can be used to improve the classification accuracy of feature selection methods [8]. In this thesis, we extend an existing feature selection method, which uses Gaussian Processes (GP) [47] with Automatic Relevance Determination (ARD) [28, 34], by incorporating cross validation, and thereby propose a new feature selection method.
Experiments on real gene expression data sets show that our method outperforms all other feature selection methods in terms of classification accuracy, and achieves stability comparable to that of Sparse Multinomial Logistic Regression (SMLR) [23], the most stable feature selection method in the literature.
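
The stability criterion described above can be illustrated with a small sketch. The abstract does not say how the thesis quantifies stability, so everything here is an assumption made for illustration: the score is the mean pairwise Jaccard similarity between the feature sets selected on leave-one-out versions of the training data, and the selector (`select_top_k`, a hypothetical filter that ranks genes by the absolute difference of class means) stands in for whichever kernel-based method is being evaluated.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature-index sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(feature_sets):
    """Mean pairwise Jaccard similarity over a list of selected feature sets.

    Assumed stability score for this sketch; not the thesis's definition.
    """
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_top_k(X, y, k):
    """Hypothetical filter: keep the k features with the largest absolute
    difference between the two class means (labels must be 0 or 1)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def leave_one_out_stability(X, y, k):
    """Re-run selection with each sample removed in turn, then score overlap.

    Each class needs at least two samples so that no class vanishes.
    """
    selected = []
    for i in range(len(X)):
        X_sub = X[:i] + X[i + 1:]
        y_sub = y[:i] + y[i + 1:]
        selected.append(select_top_k(X_sub, y_sub, k))
    return stability(selected)


def make_toy_data(n_samples=12, n_features=50, seed=0):
    """Synthetic stand-in for expression data: feature 0 separates the
    classes strongly, every other feature is uniform noise in [0, 1)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[10.0 * label] + [rng.random() for _ in range(n_features - 1)]
         for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("stability, k=1:", leave_one_out_stability(X, y, k=1))
    print("stability, k=10:", leave_one_out_stability(X, y, k=10))
```

A score near 1 means the selected set barely changes when one sample is removed; lower scores mean the selection depends heavily on individual samples. Other overlap measures or perturbation schemes (bootstrap resampling, for example) would fit the same description.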

Similar resources

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression profiling gives access to a wealth of information spanning thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification

Background: In the current study, a hybrid feature selection approach combining filter and wrapper methods is applied to several bioscience databases with varying numbers of records, attributes, and classes; the strategy therefore enjoys the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival. Method...

Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...

Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data

Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...

Publication date: 2006